Stochastic Compositional Gradient Descent: Algorithms for Minimizing Nonlinear Functions of Expected Values

Authors

  • Mengdi Wang
  • Ethan X. Fang
  • Han Liu
Abstract

Classical stochastic gradient methods are well suited for minimizing expected-value objective functions. However, they do not apply to the minimization of a nonlinear function of expected values, i.e., problems of the form min_x f(E_w[g_w(x)]). In this paper, we propose a class of stochastic compositional gradient descent (SCGD) algorithms that can be viewed as stochastic versions of quasi-gradient methods. SCGD updates the solution based on random sample gradients/subgradients of f and g_w, and uses an auxiliary variable to track the unknown quantity E_w[g_w(x)]. We prove that SCGD converges almost surely to an optimal solution for convex optimization problems, as long as such a solution exists. The convergence involves the interplay of two iterations with different time scales. For nonsmooth convex problems, the averaged iterates of SCGD achieve a convergence rate of O(k^(-1/4)) in the general case and O(k^(-2/3)) in the strongly convex case. For smooth convex problems, SCGD can be accelerated to converge at a rate of O(k^(-2/7)) in the general case and O(k^(-4/5)) in the strongly convex case. For nonconvex optimization problems, we prove that SCGD converges to a stationary point and provide a convergence rate analysis. The stochastic setting in which one wants to optimize nonlinear functions of expected values using noisy samples is very common in practice, and the proposed SCGD methods may find wide application in learning, estimation, dynamic programming, etc.
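The two-time-scale iteration described in the abstract can be sketched as follows. This is a minimal illustration on a toy problem, not the paper's exact algorithm: it assumes g_w(x) = x + w with Gaussian noise w and f(y) = ||y||^2, so that f(E_w[g_w(x)]) = ||x||^2 is minimized at x = 0. The step-size schedules alpha_k and beta_k are illustrative choices for the fast and slow time scales, not constants taken from the paper's analysis.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 5
x = rng.normal(size=d)   # decision variable
y = np.zeros(d)          # auxiliary variable tracking E_w[g_w(x)]

for k in range(1, 20001):
    alpha = k ** -0.75   # step size for the x update (fast time scale)
    beta = k ** -0.5     # weight for the y update (slow time scale)

    w = rng.normal(size=d)
    g_sample = x + w     # one noisy sample of the inner function g_w(x)

    # Running estimate of the unknown inner expectation E_w[g_w(x)].
    y = (1 - beta) * y + beta * g_sample

    # Compositional gradient step: here the Jacobian of g_w is the
    # identity and grad f(y) = 2y, so the sampled gradient is 2y.
    x = x - alpha * 2 * y

print(np.linalg.norm(x))  # should be close to 0 (the minimizer)
```

Note the key difference from plain SGD: the gradient of f is evaluated at the tracked estimate y rather than at a fresh sample of g_w(x), which is what lets the method handle the nonlinearity of f around the expectation.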


Related Articles

Fastest Rates for Stochastic Mirror Descent Methods

Relative smoothness, a notion introduced in [6] and recently rediscovered in [3, 18], generalizes the standard notion of smoothness typically used in the analysis of gradient-type methods. In this work we take ideas from the well-studied field of stochastic convex optimization and use them to obtain faster algorithms for minimizing relatively smooth functions. We propose and analyze ...


Duality-free Methods for Stochastic Composition Optimization

We consider composition optimization with two expected-value functions, in the form (1/n) sum_{i=1}^{n} F_i( (1/m) sum_{j=1}^{m} G_j(x) ) + R(x), which formulates many important problems in statistical learning and machine learning, such as solving Bellman equations in reinforcement learning and nonlinear embedding. Full-gradient or classical stochastic gradient descent based optimization algorithms are unsuitab...


Identification of Multiple Input-multiple Output Non-linear System Cement Rotary Kiln using Stochastic Gradient-based Rough-neural Network

Because of the interactions among the variables of a multiple-input multiple-output (MIMO) nonlinear system, its identification is a difficult task, particularly in the presence of uncertainties. The cement rotary kiln (CRK) is a MIMO nonlinear system in the cement factory with a complicated mechanism and uncertain disturbances. The identification of CRK is very important for different pur...


Stochastic Smoothing for Nonsmooth Minimizations: Accelerating SGD by Exploiting Structure

In this work we consider the stochastic minimization of nonsmooth convex loss functions, a central problem in machine learning. We propose a novel algorithm called Accelerated Nonsmooth Stochastic Gradient Descent (ANSGD), which exploits the structure of common nonsmooth loss functions to achieve optimal convergence rates for a class of problems including SVMs. It is the first stochastic algori...


Stochastic Nonconvex Optimization with Large Minibatches

We study stochastic optimization of nonconvex loss functions, which are typical objectives for training neural networks. We propose stochastic approximation algorithms which optimize a series of regularized, nonlinearized losses on large minibatches of samples, using only first-order gradient information. Our algorithms provably converge to an approximate critical point of the expected objectiv...




Publication date: 2015